non-convex optimization problem
Successive Affine Learning for Deep Neural Networks
This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem, which is often challenging to solve numerically due to its non-convexity and its large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a relatively small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem; the activation function is involved only *after* the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the system generated by the SAL model. Moreover, we provide a convergence theorem for the SAL process: either it terminates after a finite number of grades, or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present proof-of-concept numerical examples demonstrating that the proposed SAL model significantly outperforms the traditional deep learning model.
- Education (1.00)
- Materials > Chemicals > Industrial Gases > Liquified Gas (0.67)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.67)
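A minimal NumPy sketch may help fix intuition for the grade-by-grade idea. It is not the paper's algorithm: the inner affine maps below are random stand-ins (in SAL they are themselves obtained from convex training problems), and `sal_like_expansion`, `grades`, and `width` are hypothetical names. Each grade fits the current residual with a shallow network whose outer layer is trained by a convex regularized least-squares problem, so the residual norms are non-increasing across grades, echoing the Pythagorean-identity structure described above.

```python
import numpy as np

def sal_like_expansion(X, y, grades=5, width=32, lam=1e-6, seed=0):
    """Toy grade-by-grade function approximation in the spirit of SAL/MGDL.
    Each grade fits the current residual with a shallow network; only the
    outer layer is trained here, via a convex least-squares problem."""
    rng = np.random.default_rng(seed)
    residual = y.astype(float).copy()
    approx = np.zeros_like(residual)
    for g in range(grades):
        W = rng.standard_normal((X.shape[1], width))  # stand-in for the trained affine map
        b = rng.standard_normal(width)
        F = np.tanh(X @ W + b)                        # activation applied after the affine map
        # convex (regularized least-squares) fit of this grade's outer layer
        c = np.linalg.solve(F.T @ F + lam * np.eye(width), F.T @ residual)
        term = F @ c                                  # this grade's adaptive basis-function term
        approx += term
        residual = residual - term                    # the next grade targets what is left
        print(f"grade {g}: residual norm = {np.linalg.norm(residual):.4f}")
    return approx

# Example: approximate a 1-D function from random samples
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(4 * X[:, 0])
yhat = sal_like_expansion(X, y)
```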
Bounding Optimality Gaps for Non-Convex Optimization Problems: Applications to Nonlinear Safety-Critical Systems
Akella, Prithvi, Ames, Aaron D.
Efficient methods to provide sub-optimal solutions to non-convex optimization problems, with knowledge of each solution's sub-optimality, would facilitate the widespread application of nonlinear optimal control algorithms. To that end, leveraging recent work in risk-aware verification, we provide two algorithms to (1) probabilistically bound the optimality gaps of solutions reported by novel percentile optimization techniques, and (2) probabilistically bound the maximum optimality gap reported by percentile approaches in repetitive applications, e.g., Model Predictive Control (MPC). Notably, our results work for a large class of optimization problems. We showcase the efficacy and repeatability of our results on a few benchmark non-convex optimization problems, and the utility of our results for control in a Nonlinear MPC setting.
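To make the percentile idea concrete, here is a hedged sketch under standard scenario-optimization assumptions; `percentile_bound` and its parameters are illustrative names, not the authors' API. Drawing n i.i.d. candidates and keeping the best one lands, with confidence at least 1 - (1 - eps)^n, in the best eps-fraction of the sampled value distribution, which is the kind of probabilistic optimality-gap statement the paper formalizes.

```python
import numpy as np

def percentile_bound(objective, sampler, n=460, eps=0.01):
    """Draw n i.i.d. candidates and keep the best. With probability at least
    1 - (1 - eps)**n, the returned value lies within the best eps-fraction of
    the sampled distribution (standard scenario-style argument)."""
    xs = [sampler() for _ in range(n)]
    vals = np.array([objective(x) for x in xs])
    i = int(np.argmin(vals))
    confidence = 1 - (1 - eps) ** n        # ~0.99 for the defaults above
    return xs[i], vals[i], confidence

# Example on a non-convex 1-D toy objective
rng = np.random.default_rng(0)
x, v, conf = percentile_bound(
    lambda x: np.sin(3 * x) + 0.1 * x**2,  # non-convex objective
    lambda: rng.uniform(-5, 5),            # i.i.d. candidate sampler
)
```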
RMSprop: A Powerful Optimization Algorithm for Neural Networks
In the field of machine learning, optimizing neural network models is a crucial task to achieve high performance in various applications such as image recognition, natural language processing, and speech recognition. One of the popular optimization algorithms used for this task is RMSprop. In this article, we will explore RMSprop in detail, including its concept, math, implementation, and comparison with other algorithms. RMSprop is a variant of gradient descent, which is one of the most common optimization algorithms used for training neural networks. It was first introduced by Geoffrey Hinton in his Coursera course on neural networks in 2012.
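The core of RMSprop is a per-parameter step size scaled by a running average of squared gradients. Below is a minimal NumPy sketch of the standard update rule; the function name and hyperparameter defaults are illustrative.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, beta=0.9, eps=1e-8):
    """One RMSprop update: divide the gradient by a running RMS of its history."""
    cache = beta * cache + (1 - beta) * grad**2    # EMA of squared gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)     # per-parameter adaptive step
    return w, cache

# Example: minimize f(w) = ||w||^2 from a random start
w = np.random.default_rng(0).standard_normal(3)
cache = np.zeros(3)
for _ in range(500):
    w, cache = rmsprop_step(w, 2 * w, cache)       # gradient of ||w||^2 is 2w
```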
How Non-Convex Optimization Works, Part 2 (Machine Learning)
Abstract: In this paper, we propose a weak approximation of the reflection coupling (RC) for stochastic differential equations (SDEs) and prove that it converges weakly to the desired coupling. In contrast to the RC, the proposed approximate reflection coupling (ARC) need not take the hitting time of the processes on the diagonal set into consideration and can be defined as the solution of certain SDEs over the whole time interval. Therefore, ARC works effectively for SDEs with different drift terms. As an application of ARC, an evaluation of the effectiveness of stochastic gradient descent in a non-convex setting is also described.

Abstract: The online optimization problem with non-convex loss functions over a closed convex set, coupled with a set of inequality (possibly non-convex) constraints, is a challenging online learning problem.
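As background for the SDE viewpoint in the first abstract, a minimal Euler-Maruyama discretization of dX_t = b(X_t) dt + sigma dW_t shows how gradient-flow drifts of a non-convex potential are simulated; this helper is a generic illustration, not the paper's ARC construction.

```python
import numpy as np

def euler_maruyama(drift, x0, sigma=0.1, dt=1e-2, steps=1000, seed=0):
    """Euler-Maruyama discretization of dX = drift(X) dt + sigma dW, the kind
    of SDE under which couplings and SGD analyses are typically set up."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# Gradient-flow drift -f'(x) of the non-convex double well f(x) = x^4 - 4x^2
x_final = euler_maruyama(lambda x: -(4 * x**3 - 8 * x), x0=[2.5])
```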
A Particle-based Sparse Gaussian Process Optimizer
Bajaj, Chandrajit, Vaidya, Omatharv Bharat, Wang, Yi
Task learning in neural networks typically requires finding a globally optimal minimizer of a loss-function objective. Conventional designs of swarm-based optimization methods apply a fixed update rule, possibly with an adaptive step size for gradient-descent-based optimization. While these methods have achieved great success in solving many optimization problems, there are cases where such schemes are either inefficient or become trapped in local minima. We present a new particle-swarm-based framework that utilizes Gaussian process regression to learn the underlying dynamical process of descent. The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction. Empirical results show our approach can escape local minima where widely used state-of-the-art optimizers fail when solving non-convex optimization problems. We also test our approach in a high-dimensional parameter space, namely on an image classification task.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)
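A toy sketch of the GP-surrogate idea behind the abstract above (not the authors' particle-swarm framework; `surrogate_step` and its parameters are hypothetical): probe around the current state, fit a Gaussian process to the local landscape, and move to wherever the surrogate predicts the lowest value, buying extra exploration before committing to a descent direction.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def surrogate_step(f, x, rng, radius=0.5, n_probe=20):
    """One surrogate-guided move: sample probes around x, fit a GP to the
    local landscape, and step to the candidate the GP predicts to be lowest."""
    X = x + radius * rng.standard_normal((n_probe, x.size))  # explore around x
    y = np.array([f(p) for p in X])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=radius)).fit(X, y)
    cand = x + radius * rng.standard_normal((200, x.size))
    mu = gp.predict(cand)
    return cand[np.argmin(mu)]                               # greedy on the surrogate

rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])
for _ in range(20):
    x = surrogate_step(lambda p: np.sin(3 * p).sum() + (p**2).sum(), x, rng)
```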
Ph.D. Students' Top 10 Interesting ML Dissertations
Doctoral candidates are strongly motivated to select research areas that open fresh, inventive avenues for scientific advancement. Choosing and pursuing a machine learning dissertation topic is difficult because machine learning relies on statistical techniques to teach computers to perform particular tasks without explicit programming. The primary goal of machine learning is to build intelligent machines that can function and think like people. This article presents the top 10 ML dissertations for Ph.D. candidates to attempt in 2022.

Text Mining and Text Classification
Text mining is an AI technology that uses NLP to transform the free text in documents and databases into normalized, structured data suitable for analysis or for driving ML algorithms.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.37)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)
Provable Non-Convex Optimization and Algorithm Validation via Submodularity
Submodularity is one of the most well-studied properties of problem classes in combinatorial optimization and in many applications of machine learning and data mining, with strong implications for guaranteed optimization. In this thesis, we investigate the role of submodularity in provable non-convex optimization and in the validation of algorithms. A profound understanding of which classes of functions can be tractably optimized remains a central challenge for non-convex optimization. By advancing the notion of submodularity to continuous domains (termed "continuous submodularity"), we characterize a class of generally non-convex and non-concave functions (continuous submodular functions) and derive algorithms for approximately maximizing them with strong approximation guarantees. Meanwhile, continuous submodularity captures a wide spectrum of applications, ranging from revenue maximization with general marketing strategies and MAP inference for DPPs to mean-field inference for probabilistic log-submodular models, which renders it valuable domain knowledge for optimizing this class of objectives. Validation of algorithms is an information-theoretic framework for investigating the robustness of algorithms to fluctuations in the input/observations, and their generalization ability. We investigate various algorithms for one of the paradigmatic unconstrained submodular maximization problems, MaxCut. Due to the submodularity of the MaxCut objective, we are able to present efficient approaches to calculating the algorithmic information content of MaxCut algorithms. The results provide insights into the robustness of different algorithmic techniques for MaxCut.
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
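Since the thesis above studies MaxCut algorithms, a minimal 1-flip local search baseline (a classic heuristic, not the thesis's contribution) illustrates the objective; it assumes a symmetric weight matrix with zero diagonal.

```python
import numpy as np

def local_search_maxcut(W, iters=2000, seed=0):
    """1-flip local search for MaxCut on a symmetric weight matrix W with
    zero diagonal. Cut value = sum of W_ij over edges crossing the cut."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    s = rng.integers(0, 2, n) * 2 - 1        # random +/-1 partition
    for _ in range(iters):
        i = rng.integers(n)
        if s[i] * (W[i] @ s) > 0:            # flipping i increases the cut
            s[i] = -s[i]
    return s, (W.sum() - s @ W @ s) / 4      # cut value

# Example on a random unweighted graph
rng = np.random.default_rng(1)
A = rng.integers(0, 2, (20, 20))
W = np.triu(A, 1)
W = (W + W.T).astype(float)                  # symmetric, zero diagonal
s, cut = local_search_maxcut(W)
```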
10 Compelling Machine Learning Dissertations from Ph.D. Students
This dissertation proposes efficient algorithms and provides theoretical analysis, through the lens of spectral methods, for some important non-convex optimization problems in machine learning. Specifically, the focus is on two types of non-convex optimization problems: learning the parameters of latent variable models and learning in deep neural networks. Learning latent variable models is traditionally framed as a non-convex optimization problem through Maximum Likelihood Estimation (MLE). For some specific models, such as the multi-view model, it is possible to bypass the non-convexity by leveraging the special model structure and converting the problem into a spectral decomposition through a Method-of-Moments (MM) estimator. In this research, a novel algorithm is proposed that can flexibly learn a multi-view model in a non-parametric fashion.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
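The spectral flavor can be illustrated with the most basic primitive behind method-of-moments estimators: power iteration on an empirical moment matrix. This is a toy sketch, not the dissertation's algorithm; the planted-direction setup below is an assumption for illustration.

```python
import numpy as np

def power_iteration(M, iters=100, seed=0):
    """Recover the leading eigenvector of a moment matrix by power iteration,
    the basic primitive behind spectral / method-of-moments estimators."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

# Toy: a planted direction u recovered from the empirical second moment E[x x^T]
rng = np.random.default_rng(1)
u = np.array([0.6, 0.8])
X = rng.standard_normal((5000, 1)) * u + 0.1 * rng.standard_normal((5000, 2))
M = X.T @ X / len(X)
v = power_iteration(M)   # v is approximately +/- u
```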
A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
A novel linear feature selection algorithm is presented based on the global minimization of a data-dependent generalization error bound. Feature selection and scaling algorithms often lead to non-convex optimization problems, which in many previous approaches were addressed through gradient descent procedures that can only guarantee convergence to a local minimum. We propose an alternative approach, whereby the global solution of the non-convex optimization problem is derived via an equivalent convex optimization problem. Moreover, the convex optimization task is reduced to a conic quadratic programming problem, for which efficient solvers are available. Highly competitive numerical results on both artificial and real-world data sets are reported.
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
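For flavor, here is a small CVXPY example of an optimization problem that solvers handle as a conic quadratic (second-order cone) program; it is a generic sparse-regression stand-in, not the paper's generalization-bound objective, and the data and penalty weight are assumptions.

```python
import cvxpy as cp
import numpy as np

# Illustrative norm-regularized fit; CVXPY reduces it to a second-order
# cone program internally, the problem class the abstract refers to.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 5)), rng.standard_normal(50)
w = cp.Variable(5)
prob = cp.Problem(cp.Minimize(cp.norm(X @ w - y, 2) + 0.1 * cp.norm(w, 1)))
prob.solve()                      # dispatched to an efficient conic solver
print(prob.status, w.value)
```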